Local Semantic Indexing Based on Partial Least Squares for Text Classification
Abstract
Semantic Indexing based on Partial Least Squares (SIPLS) is an effective feature extraction method for text classification. SIPLS integrates the global category information Y with the document-term matrix X to construct latent semantic spaces. However, the global latent space may not be optimal for every individual class. To address this problem, the Local SIPLS (LSIPLS) method is proposed, which builds one SIPLS space for each class. Free from the influence of global information, the local discriminative components can be extracted more easily in LSIPLS. Compared with global SIPLS, LSIPLS achieves similar performance with a much more compact dimensionality. Empirical results on the Reuters corpus show that LSIPLS is a powerful tool for text classification.
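To make the per-class construction concrete, the sketch below illustrates the local latent-space idea in Python, using scikit-learn's PLSRegression as a stand-in for SIPLS in a one-vs-rest setup. The toy documents, the TF-IDF vectorizer, and the choice of n_components=2 are illustrative assumptions, not the authors' actual configuration or data.

```python
# Minimal sketch of local semantic indexing per class (one-vs-rest),
# using scikit-learn's PLSRegression as a stand-in for SIPLS.
# Documents, labels, and n_components are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["wheat prices rose sharply", "crude oil futures fell",
        "corn and wheat exports grew", "opec cut oil production"]
labels = np.array(["grain", "oil", "grain", "oil"])

# Document-term matrix X (dense, since PLSRegression centers its input).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs).toarray()

classes = np.unique(labels)
local_models = {}
for c in classes:
    y_c = (labels == c).astype(float)      # binary indicator for class c
    pls = PLSRegression(n_components=2)    # compact local latent space
    pls.fit(X, y_c)                        # components driven by class c only
    local_models[c] = pls

def classify(texts):
    """Score each text in every local latent space and pick the best class."""
    Xt = vectorizer.transform(texts).toarray()
    scores = np.column_stack(
        [local_models[c].predict(Xt).ravel() for c in classes])
    return classes[scores.argmax(axis=1)]

print(classify(["oil prices climbed", "wheat harvest expanded"]))
```

In this sketch each class gets its own low-dimensional projection, so the components that separate one class from the rest are not diluted by the other classes; this is the intuition behind the compact dimensionality claimed for LSIPLS.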
Related Papers
Cross-lingual Information Retrieval Model Based on Bilingual Topic Correlation
How to construct relationships between bilingual texts is important for effectively processing multilingual text data and crossing language barriers. Corpus-based cross-lingual latent semantic indexing (CL-LSI) does not fully take into account the bilingual semantic relationship. The paper proposes a new model that builds the semantic relationship of bilingual parallel documents via partial least squares (PLS)....
Enhancing User Search Experience in Digital Libraries with Rotated Latent Semantic Indexing
This study investigates a semi-automatic method for the creation of topical labels representing the topical concepts in information objects. The method is called rotated latent semantic indexing (rLSI). rLSI has found application in text mining but has not been used for topical label generation in digital libraries (DLs). The present study proposes a theoretical model and an evaluation framework w...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...
Role of semantic indexing for text classification
The Vector Space Model (VSM) of text representation suffers from a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption, where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, fail...
LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier
The task of Text Classification (TC) is to automatically assign natural language texts to thematic categories from a predefined category set. Latent Semantic Indexing (LSI) is a well-known technique in Information Retrieval, especially for dealing with polysemy (one word can have different meanings) and synonymy (different words are used to describe the same concept), but it is not an opti...